
    wav2letter++: The Fastest Open-source Speech Recognition System

    This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.

    Libri-Light: A Benchmark for ASR with Limited or No Supervision

    We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
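    The metrics named for settings (2) and (3) — PER, CER, and WER — are all edit-distance rates: the Levenshtein distance between a reference transcript and a hypothesis, divided by the reference length, over phonemes, characters, or words respectively. As an illustration only (this sketch is not part of the Libri-Light toolkit), a minimal word error rate computation looks like:

    ```python
    def word_error_rate(reference: str, hypothesis: str) -> float:
        """WER = (substitutions + deletions + insertions) / reference length,
        via Levenshtein distance over word tokens."""
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # delete all i reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j  # insert all j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    # One dropped word out of six reference words: WER = 1/6
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    ```

    CER and PER follow the same formula with character or phoneme tokens in place of words; ABX, used in the zero-resource setting, is instead a discriminability score over learned representations.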

    Twenty years of coordination technologies: State-of-the-art and perspectives

    Since the complexity of inter- and intra-system interactions is steadily increasing in modern application scenarios (e.g., the IoT), coordination technologies are required to take a crucial step towards maturity. In this paper we look back at the history of the COORDINATION conference in order to shed light on the current status of the coordination technologies proposed there throughout the years, in an attempt to understand success stories and limitations, and to reveal the gap between actual technologies, theoretical models, and novel application needs.

    Modeling and programming social collaboration

    A whole is greater than the sum of its parts; a collaborating team is greater than a group of contributors working in isolation. In this thesis we introduce a novel technique called collaboration-assisted computation that evolves human-assisted computation in line with these postulates. Just as human computation focuses on integrating human input at various phases of machine computation, collaboration-assisted computation aims at integrating machine computation with input from collaborating teams. However, collaboration-assisted computation is more than a simple replacement of the term "human input" with the term "team input" in the pipeline of machine computation. What is collaboration without social interaction? How effective can collaboration be without convenient software tools? While the answers to these questions lie outside the scope of this thesis, we argue that truly efficient collaboration orbits around social context and collaborative software. Therefore, the center of gravity for collaboration-assisted computation lies at the intersection of human computation, social computing, and collaborative software. Moreover, collaboration-assisted computation relies on crowdsourcing to execute collaboration at massive scale. Hence, this thesis presents a holistic framework for modeling and programming collaboration-assisted computation. First, we present a query language capable of intuitively expressing complex social traits of collaborating groups. Second, we show how to model social collaboration processes. Third, the thesis introduces a programming language to coordinate collaborative teams and a framework for integrating social and collaborative software. Fourth, we show how crowdsourcing models can be extended to scale collaboration processes. The proposed modeling and programming languages were evaluated with extensive use cases, demonstrating the intuitiveness and expressiveness of each approach.